2024-11-19
A statistical technique for evaluating the performance and generalizability of machine learning models.
Divides dataset into training and validation subsets.
Ensures the model is trained on one subset and validated on another.
Provides more reliable estimates of model performance.
Reduces bias compared to a single train-test split.
Improves model generalizability by leveraging different training and validation data.
K-Fold Cross-Validation: The dataset is split into k folds; the model is trained on k-1 folds and validated on the remaining fold, repeated k times (Kohavi, 1995).
Leave-One-Out Cross-Validation (LOOCV): A special case of K-Fold where k equals the number of observations, each sample serves as the validation set once.
Nested Cross-Validation: Used for model selection and hyperparameter tuning, an outer loop for validation and an inner loop for training and hyperparameter optimization.
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
where y_i is the actual value, \hat{y}_i is the predicted value, and n is the total number of observations.
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
where y_i is the observed value, \hat{y}_i is the predicted value, and n is the total number of observations.
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
where \bar{y} is the mean of the actual values.
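These formulas translate directly into code; here is a minimal NumPy sketch (the sample values are illustrative, not taken from the study):

```python
import numpy as np

def mae(y, y_hat):
    # Mean Absolute Error: average magnitude of the residuals
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    # Root Mean Squared Error: penalizes large errors more heavily than MAE
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r_squared(y, y_hat):
    # R^2: 1 minus residual sum of squares over total sum of squares
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative values only
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])
```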
Divide the dataset D into K equally sized subsets (folds).
For each fold k (where k = 1, 2, \ldots, K):
The overall performance metric is then averaged over all K folds:
\text{CV}(M) = \frac{1}{K} \sum_{k=1}^{K} P_k
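The K-fold procedure above can be sketched with scikit-learn; the synthetic data, linear model, and MAE metric here are stand-ins, not the study's actual setup:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

# Synthetic data standing in for the alloy dataset (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kf.split(X):
    # Train on k-1 folds, validate on the held-out fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_scores.append(mean_absolute_error(y[val_idx], model.predict(X[val_idx])))

# CV(M) = (1/K) * sum of per-fold metrics P_k
cv_score = np.mean(fold_scores)
```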
Divide the dataset D into n subsets, where n is the number of observations.
For each observation i (where i = 1, 2, \ldots, n):
The overall performance metric is then averaged over all n observations:
\text{LOOCV}(M) = \frac{1}{n} \sum_{i=1}^{n} P_i
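A LOOCV sketch under the same assumptions (synthetic stand-in data, linear model, absolute error as each P_i):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

# Small synthetic dataset (illustrative only); LOOCV fits n separate models
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=30)

errors = []
for train_idx, val_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    # Each held-out observation contributes one error term P_i
    errors.append(abs(y[val_idx][0] - model.predict(X[val_idx])[0]))

# LOOCV(M) = (1/n) * sum of the n per-observation errors
loocv_score = np.mean(errors)
```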
Outer Loop:
Inner Loop:
Performance Metrics:
\text{Inner CV}_k(M) = \frac{1}{J} \sum_{j=1}^{J} P_{kj}, where J is the number of inner folds and P_{kj} is the performance on inner fold j of outer fold k.
\text{Nested CV}(M) = \frac{1}{K} \sum_{k=1}^{K} \text{Inner CV}_k(M)
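Nested CV can be sketched by wrapping a hyperparameter search (inner loop) inside an outer cross-validation; the SVR model, parameter grid, and synthetic data below are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVR

# Synthetic stand-in data (illustrative only)
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=80)

# Inner loop: hyperparameter search on the training folds only
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(SVR(kernel="rbf"), {"C": [1, 10, 100]}, cv=inner_cv)

# Outer loop: each outer fold scores a model tuned by the inner loop,
# so the performance estimate is untouched by the tuning process
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")
nested_r2 = nested_scores.mean()
```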
Martensite Start Temperature
Application
Untransformed Model: Directly modeled Ms using predictors like C, Mn, Ni, Si, Cr, with interaction terms.
Log-Transformed Model: Modeled log(Ms) to handle non-normality and stabilize variance, using the same predictors and interaction terms.
Model Improvements (Removal of Predictors, Introduction of Interaction Terms, Removal of Outliers)
Model Diagnostics (ANOVA, AIC, Cross-Validation, Multicollinearity Checks, Removal of Influential Points)
Model Evaluation
Ms = 769.41 - 286.71 C - 16.42 Mn - 14.04 Ni - 13.89 Si - 10.13 Cr - 41.45 C:Mn - 8.36 C:Ni
| Variables | Mean ± SD | Coefficient | P-value |
|---|---|---|---|
| C | 0.36 ± 0.1 | -286.71 | < 2e-16 |
| Mn | 0.79 ± 0.3 | -16.42 | 1.36e-13 |
| Ni | 1.55 ± 0.5 | -14.04 | < 2e-16 |
| Si | 0.35 ± 0.2 | -13.89 | 1.70e-13 |
| Cr | 1.04 ± 0.7 | -10.13 | < 2e-16 |
| C:Mn | N/A | -41.45 | < 2e-16 |
| C:Ni | N/A | -8.36 | 9.68e-10 |
Examples:
For every 1% increase in C, Ms decreases by 286.71 units, holding the other elements constant; because of the interaction terms, the total effect of C also depends on the levels of Mn and Ni.
The C:Mn interaction decreases Ms by an additional 41.45 units per unit increase in the product of C and Mn.
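The fitted equation can be evaluated directly. This sketch plugs in an illustrative composition (chosen for demonstration, not a row from the dataset) and shows how the interaction terms make the effect of C depend on Mn and Ni:

```python
def predict_ms(C, Mn, Ni, Si, Cr):
    # Fitted untransformed model, including the C:Mn and C:Ni interactions
    return (769.41 - 286.71 * C - 16.42 * Mn - 14.04 * Ni
            - 13.89 * Si - 10.13 * Cr - 41.45 * C * Mn - 8.36 * C * Ni)

# Illustrative composition (wt%), not a row from the dataset
base = predict_ms(C=0.36, Mn=0.79, Ni=1.55, Si=0.35, Cr=1.04)
more_c = predict_ms(C=1.36, Mn=0.79, Ni=1.55, Si=0.35, Cr=1.04)

# With interactions, the effect of +1% C is not a constant -286.71:
# it also picks up -41.45 * Mn and -8.36 * Ni
delta = more_c - base
```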
log(Ms) = 6.69 - 0.51 C - 0.03 Mn - 0.03 Ni - 0.03 Si - 0.02 Cr - 0.07 C:Mn - 0.01 C:Ni
| Variables | Mean ± SD | Coefficient | P-value |
|---|---|---|---|
| C | 0.36 ± 0.1 | -0.51 | < 2e-16 |
| Mn | 0.79 ± 0.3 | -0.032 | < 2e-16 |
| Ni | 1.55 ± 0.5 | -0.0255 | < 2e-16 |
| Si | 0.35 ± 0.2 | -0.0226 | 4.48e-13 |
| Cr | 1.04 ± 0.7 | -0.0175 | < 2e-16 |
| C:Mn | N/A | -0.0751 | < 2e-16 |
| C:Ni | N/A | -0.0154 | 1.01e-11 |
Examples:
For every 1% increase in C, Ms is multiplied by e^{-0.51} ≈ 0.60; that is, Ms decreases by approximately 40% per 1% increase in C.
The C:Mn interaction multiplies Ms by an additional factor of e^{-0.07} ≈ 0.93, a further 7% decrease when both C and Mn increase together.
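The multiplicative interpretation follows from exponentiating the coefficients; a small worked check:

```python
import math

# A coefficient b on predictor x in the log(Ms) model scales Ms
# by a factor of exp(b) for each unit increase in x
factor_c = math.exp(-0.51)    # about 0.60: roughly a 40% drop per 1% C
factor_cmn = math.exp(-0.07)  # about 0.93: a further ~7% drop from C:Mn

percent_drop_c = (1.0 - factor_c) * 100.0  # roughly 40%
```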
| Model | Cross-Validation | RMSE | MAE | R² |
|---|---|---|---|---|
| Untransformed Model | 5-Fold | 27.79 | 20.43 | 0.90 |
| Log-Transformed Model | 5-Fold | 0.05 | 18.28 | 0.91 |
| Untransformed Model | LOOCV | 27.80 | 20.43 | 0.90 |
| Log-Transformed Model | LOOCV | 0.05 | 22.02 | 0.91 |
Note: the log-transformed model's RMSE values appear to be on the log(Ms) scale and are therefore not directly comparable to the untransformed model's.
5-Fold: Offers slightly lower MAE and RMSE, indicating better stability when training on subsets of the data.
LOOCV: Is slightly more sensitive to data variations but confirms consistent results with 5-Fold.
In our study, we analyzed the martensite dataset from Wentzien et al. (2024), which focuses on predicting the Martensite Start Temperature (Ms) of steel alloys from their chemical composition.
Correlation Matrix
Linear regression is a fundamental statistical technique that establishes a relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.
In our dataset, which focuses on predicting the Martensite Start Temperature (Ms) of steel based on its chemical composition (C, Mn, Si, Cr, Ni), linear regression allows us to quantify how changes in these elements influence Ms.
M_s = \beta_0 +\beta_1 C +\beta_2 Mn + \beta_3 Si + \beta_4 Cr + \beta_5 Ni
M_s = 746.99 - 254.85 C - 24.24 Mn - 13.28 Si - 7.8 Cr - 14.64 Ni
| Term | Estimate | Std. Error | t value | p value |
|---|---|---|---|---|
| (Intercept) | 746.99268 | 4.0289613 | 185.405771 | 0.0000000 |
| C | -254.85890 | 5.7347802 | -44.440919 | 0.0000000 |
| Mn | -24.24356 | 2.5175491 | -9.629826 | 0.0000000 |
| Si | -13.28195 | 3.6933099 | -3.596218 | 0.0003357 |
| Cr | -7.82620 | 0.7366216 | -10.624451 | 0.0000000 |
| Ni | -14.64102 | 0.2895086 | -50.571976 | 0.0000000 |
Linear Regression Coefficients
Statistics
Residual standard error: 54.28 on 1230 degrees of freedom
Multiple R-squared: 0.7433,
Adjusted R-squared: 0.7422
F-statistic: 712.2 on 5 and 1230 DF,
p-value: < 2.2e-16
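The main-effects model is ordinary least squares; a minimal sketch using NumPy's lstsq on synthetic stand-in data (the real dataset and its exact estimates are not reproduced here):

```python
import numpy as np

# Synthetic compositions standing in for C, Mn, Si, Cr, Ni (illustrative);
# the "true" coefficients below are borrowed from the fitted equation only
# to generate plausible data, not to reproduce the study's estimates
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([
    rng.uniform(0.1, 0.6, n),   # C
    rng.uniform(0.2, 1.4, n),   # Mn
    rng.uniform(0.1, 0.7, n),   # Si
    rng.uniform(0.1, 2.0, n),   # Cr
    rng.uniform(0.5, 2.5, n),   # Ni
])
true_beta = np.array([746.99, -254.86, -24.24, -13.28, -7.83, -14.64])

A = np.column_stack([np.ones(n), X])  # design matrix with intercept column
y = A @ true_beta + rng.normal(scale=5.0, size=n)

# Ordinary least squares: minimize ||A @ beta - y||^2
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```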
The linear regression model evaluated with 5-Fold Cross-Validation (5-Fold CV) and Leave-One-Out Cross-Validation (LOOCV) shows strong predictive accuracy. Both estimates are consistently better than the nested CV estimate, which gives a more conservative, and typically less optimistic, measure of out-of-sample performance.
| Metric | Value |
|---|---|
| RMSE | 48.27 |
| MAE | 32.28 |
| R² | 0.81 |
K-Fold Cross-Validation
| Metric | Value |
|---|---|
| RMSE | 48.27 |
| MAE | 32.28 |
| R² | 0.81 |
Leave-One-Out Cross-Validation (LOOCV)
| Metric | Value |
|---|---|
| RMSE | 53.28 |
| MAE | 33.46 |
| R² | 0.75 |
Nested Cross-Validation
Support Vector Machines (SVM) model data by finding optimal decision boundaries and handle nonlinear patterns through kernel functions. On our dataset, an SVM with a radial (RBF) kernel predicts the martensite start temperature (Ms) from chemical elements such as C, Mn, Ni, Si, and Cr, capturing the complex relationships between composition and Ms.
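A sketch of this setup with scikit-learn's SVR (RBF kernel) under 5-fold CV; the synthetic data and hyperparameters are illustrative assumptions, and feature scaling is included because SVMs are scale-sensitive:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic nonlinear data standing in for the alloy features (illustrative)
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# The RBF kernel captures nonlinear relationships; scaling first matters
# because SVMs are sensitive to feature magnitudes
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
mean_r2 = scores.mean()
```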
| Metric | Value |
|---|---|
| RMSE | 35.93 |
| MAE | 20.98 |
| R² | 0.90 |
SVM: K-Fold Cross-Validation
| Metric | Value |
|---|---|
| RMSE | 52.61 |
| MAE | 28.49 |
| R² | 0.79 |
SVM: Leave-One-Out Cross-Validation (LOOCV)
| Metric | Value |
|---|---|
| RMSE | 40.09 |
| MAE | 22.24 |
| R² | 0.86 |
SVM: Nested Cross-Validation
In this study, we compared Linear Regression and Support Vector Machine (SVM) models for predicting the martensite start temperature (Ms). Using 5-fold cross-validation, SVM outperformed Linear Regression, achieving a lower MAE (~21 vs ~32), a higher R² (0.90 vs 0.81), and a lower RMSE (~36 vs ~48), highlighting its superior accuracy and reliability in making predictions.
| Method | Metric | Linear Regression | SVM |
|---|---|---|---|
| 5-Fold | RMSE | 48.27 | 35.93 |
| 5-Fold | MAE | 32.28 | 20.98 |
| 5-Fold | R² | 0.81 | 0.90 |
| LOOCV | RMSE | 48.27 | 52.61 |
| LOOCV | MAE | 32.28 | 28.49 |
| LOOCV | R² | 0.81 | 0.79 |
| Nested CV | RMSE | 53.28 | 40.09 |
| Nested CV | MAE | 33.46 | 22.24 |
| Nested CV | R² | 0.75 | 0.86 |
Model Comparison Results
Model Comparison Results Plot
Linear Regression Model
Support Vector Machine Model
k-fold Cross-validation
Leave-one-out Cross-validation (LOOCV)
Nested Cross-validation
Mean Absolute Error (MAE): SVM performed better, with a lower MAE (~21) than Linear Regression (~32), meaning SVM's predictions were closer to the actual values.
R-squared (R²): SVM showed a higher R² (0.90) than Linear Regression (0.81), indicating that SVM explained 90% of the variability in Ms, while Linear Regression accounted for about 81%.
Root Mean Squared Error (RMSE): SVM had a lower RMSE (~36) than Linear Regression (~48), reflecting greater accuracy and fewer large prediction errors.
Overall Performance: Across all key metrics, SVM outperformed Linear Regression, proving to be a more accurate and reliable model for predicting martensite start temperature (Ms).